    Techniques for improving clustering and association rules mining from very large transactional databases

    Clustering and association rules mining are two core data mining tasks that have been actively studied by the data mining community for nearly two decades. Although many clustering and association rules mining algorithms have been developed, no single algorithm outperforms the others in all respects, such as accuracy, efficiency, scalability, adaptability and memory usage. More efficient and effective algorithms are still needed to handle large-scale, complex stored datasets, and emerging applications in which data arrives as streams pose new challenges for the data mining community. Existing techniques and algorithms for static stored databases cannot be applied to data streams directly; they need to be extended or modified, or new methods need to be developed to process data streams.
    In this thesis, algorithms have been developed for improving the efficiency and accuracy of clustering and association rules mining on very large, high-dimensional, high-cardinality, sparse transactional databases and data streams.
    A new similarity measure suitable for clustering transactional data is defined, and an incremental clustering algorithm, INCLUS, is proposed using this similarity measure. The algorithm scans the database only once and produces clusters based on the user's expectations of similarity between transactions in a cluster, which are controlled by two user input parameters: a similarity threshold and a support threshold. Extensive testing has been performed to evaluate the effectiveness, efficiency, scalability and order insensitivity of the algorithm.
    To extend INCLUS to transactional data streams, an equal-width time window model and an elastic time window model are proposed that allow mining of clustering changes in evolving data streams. The minimal width of the window is determined by the minimum clustering granularity for a particular application.
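    As an illustration of the equal-width time window model only, the following Python sketch groups a timestamped transaction stream into fixed-width windows and rejects any width smaller than the minimum clustering granularity. The function name `equal_width_windows`, the (timestamp, items) input format and the per-window item counts are hypothetical stand-ins, not the summary statistics that CluStream_EQ actually writes to disk, and the sketch assumes timestamps arrive in non-decreasing order.

```python
from collections import Counter, defaultdict


def equal_width_windows(stream, width, min_granularity):
    """Group (timestamp, items) pairs into equal-width time windows.

    Hypothetical sketch: assumes timestamps are non-decreasing and returns
    {window_index: Counter of item occurrences} as a stand-in summary.
    """
    if width < min_granularity:
        # A window may not be narrower than the application's minimum granularity.
        raise ValueError("width must be >= min_granularity")
    summaries = defaultdict(Counter)
    start = None
    for ts, items in stream:
        if start is None:
            start = ts  # the first timestamp anchors window 0
        index = int((ts - start) // width)
        summaries[index].update(items)
    return dict(summaries)
```

    Windows that receive no transactions simply do not appear in the returned dictionary.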
    Two algorithms, CluStream_EQ and CluStream_EL, based on the equal-width window model and the elastic window model respectively, are developed by incorporating these models into INCLUS. Each algorithm consists of an online micro-clustering component and an offline macro-clustering component. The online component writes summary statistics of a data stream to disk, and the offline component uses those summaries and other user input to discover changes in the data stream. The effectiveness and scalability of the algorithms are evaluated experimentally.
    This thesis also examines sampling techniques that can improve the efficiency of mining association rules in a very large transactional database. The sample size is derived from the binomial distribution and the central limit theorem. It is smaller than the sample size based on Chernoff bounds but still provides the same approximation guarantees. The accuracy of the proposed sampling approach is analyzed theoretically, and its effectiveness is evaluated experimentally on both dense and sparse datasets.
    Applications of stratified sampling to association rules mining are also explored. The database is first partitioned into strata based on transaction length, and simple random sampling is then performed on each stratum. The total sample size is determined by a formula derived in this thesis, and the sample size for each stratum is proportional to the size of the stratum. The accuracy of transaction-size-based stratified sampling is compared experimentally with that of random sampling.
    The thesis concludes with a summary of its significant contributions and some pointers for further work.
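    The comparison between a Chernoff-bound sample size and a normal-approximation (central limit theorem) sample size, together with proportional allocation across transaction-length strata, can be sketched in Python as below. The abstract does not give the thesis's own formula. `clt_sample_size` therefore uses the textbook normal-approximation formula $n = z^2 p(1-p)/\varepsilon^2$ with the worst case $p = 0.5$, and `chernoff_sample_size` uses the additive Chernoff/Hoeffding bound $n \ge \ln(2/\delta)/(2\varepsilon^2)$. All function names and parameters are assumptions for illustration.

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist


def chernoff_sample_size(eps: float, delta: float) -> int:
    """Additive Chernoff/Hoeffding bound: 2*exp(-2*n*eps^2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def clt_sample_size(eps: float, delta: float, p: float = 0.5) -> int:
    """Normal-approximation size for estimating a support p within +/- eps
    with confidence 1 - delta; p = 0.5 is the worst case (assumption)."""
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)
    return math.ceil(z ** 2 * p * (1.0 - p) / eps ** 2)


def proportional_allocation(stratum_sizes: dict, n: int) -> dict:
    """Split n across strata in proportion to their sizes; largest
    remainders make the allocations sum exactly to n."""
    total = sum(stratum_sizes.values())
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of transactions")
    raw = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: math.floor(x) for h, x in raw.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def length_stratified_sample(transactions: list, n: int, seed=None) -> list:
    """Stratify by transaction length, then draw a simple random sample
    without replacement from each stratum."""
    strata = defaultdict(list)
    for t in transactions:
        strata[len(t)].append(t)
    alloc = proportional_allocation({h: len(ts) for h, ts in strata.items()}, n)
    rng = random.Random(seed)
    sample = []
    for h, ts in strata.items():
        sample.extend(rng.sample(ts, alloc[h]))
    return sample
```

    For ε = 0.01 and δ = 0.05, `clt_sample_size` returns 9604 and `chernoff_sample_size` returns 18445. This is in line with the abstract's point that a CLT-based size can be much smaller than a Chernoff-based one.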

    Two dimensional semiconductors with possible high room temperature mobility

    We calculated the longitudinal-acoustic-phonon-limited electron mobility of 14 two-dimensional semiconductors with composition MX$_2$, where M (= Mo, W, Sn, Hf, Zr and Pt) is the metal and X is S, Se or Te. The scattering matrix was treated within the deformation potential approximation. Of the 14 compounds, MoTe$_2$, HfSe$_2$ and HfTe$_2$ are found to be promising because they combine potentially high mobility with a finite band gap. The phonon-limited mobility can exceed 2500 cm$^2$V$^{-1}$s$^{-1}$ at room temperature.
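    For reference, one widely used form of the deformation-potential expression for acoustic-phonon-limited mobility in a 2D semiconductor is given below. The abstract does not state which variant the authors used, so treat this as an assumption rather than their equation. Here $e$ is the elementary charge, $\hbar$ the reduced Planck constant, $C_{2D}$ the in-plane elastic modulus, $k_B$ Boltzmann's constant, $T$ the temperature, $m^*$ the effective mass along the transport direction, $m_d$ the average effective mass, and $E_1$ the deformation potential constant of the band edge.

$$
\mu_{2D} = \frac{e\,\hbar^{3}\,C_{2D}}{k_{B}\,T\,m^{*}\,m_{d}\,E_{1}^{2}},
\qquad m_{d} = \sqrt{m_{x}^{*}\,m_{y}^{*}}
$$

    In this form, a stiff lattice (large $C_{2D}$), light carriers and a small $E_1$ all favor high mobility.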

    Quantitative estimation of joint roughness coefficient using statistical parameters

    Relationship between joint roughness coefficient and fractal dimension of rock fracture surfaces

    Numerous empirical equations have been proposed to estimate the joint roughness coefficient (JRC) of a rock fracture from its fractal dimension (D). This paper reviews these methods in detail and discusses their usability and limitations. Great variation is found among the previously proposed equations. This is partly because of the limited number of data points used to derive them and partly because of inconsistencies in the methods used to determine D. The 10 standard profiles on which most previous equations are based are probably too few to derive a reliable correlation, and different methods may give different values of D for a given profile. In this study, the h–L method is updated to avoid the subjectivity involved in identifying high-order asperities. The compass-walking, box-counting and updated h–L methods are employed to examine a larger population of 112 rock joint profiles. Based on the results, a new set of empirical equations is proposed, which indicates that the fractal dimensions estimated by compass-walking and the updated h–L method relate closely to JRC, whereas those estimated by box-counting do not relate as closely.
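    A generic box-counting estimate of D for a digitized profile can be sketched in Python as follows. This is not the paper's implementation. The function name, the choice of box sizes and the requirement that the profile be sampled much more finely than the smallest box are assumptions. Because cells are counted on an x–y grid, the estimate also depends on how the two axes are scaled, which is one way different settings can yield different D for the same profile.

```python
import math


def box_counting_dimension(xs, ys, box_sizes):
    """Estimate the box-counting dimension of a sampled profile.

    For each box size s, count grid cells of side s that contain at least
    one profile point, then fit log N(s) against log(1/s); the slope is D.
    Assumes the profile is sampled much more finely than the smallest box.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    log_inv_s, log_n = [], []
    for s in box_sizes:
        occupied = {(math.floor(x / s), math.floor(y / s)) for x, y in zip(xs, ys)}
        log_inv_s.append(math.log(1.0 / s))
        log_n.append(math.log(len(occupied)))
    # Ordinary least-squares slope of log N(s) versus log(1/s).
    m = len(box_sizes)
    mean_x = sum(log_inv_s) / m
    mean_y = sum(log_n) / m
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_inv_s, log_n))
    sxx = sum((a - mean_x) ** 2 for a in log_inv_s)
    return sxy / sxx
```

    A straight profile should give D close to 1, and rougher profiles give larger values.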

    Research on Situation of Chinese Excellent Athletes Disability Mutual Insurance

    Being an athlete is a high-risk career: about 70% of Chinese athletes have some degree of disability. As a consequence, disability security for athletes has long been taken seriously. This study examines the development of the Chinese Excellent Athletes Disability Mutual Insurance scheme by collecting and analysing the disability mutual insurance data for 2007 to 2012 published by the China Sports Foundation. The results show that relatively few athletes participate in the scheme and that the premium is quite low. They also show that most disabilities fall into the lower severity grades, that the body parts affected differ across sports, and that the amount of insurance payments has remained stable, with male athletes outnumbering female athletes.

    MFRL-BI: Design of a Model-free Reinforcement Learning Process Control Scheme by Using Bayesian Inference

    The design of a process control scheme is critical for quality assurance because it reduces variation in manufacturing systems. Taking semiconductor manufacturing as an example, an extensive literature focuses on control optimization based on process models (usually linear) that are obtained by experiments before the manufacturing process starts. In real applications, however, such pre-defined models may be inaccurate, especially for complex manufacturing systems. To tackle model inaccuracy, we propose a model-free reinforcement learning (MFRL) approach that conducts experiments and optimizes control simultaneously based on real-time data. Specifically, we design a novel MFRL control scheme that updates the distribution of disturbances using Bayesian inference to reduce their large variations during manufacturing. The proposed MFRL controller is shown to perform well on a nonlinear chemical mechanical planarization (CMP) process when the process model is unknown. Theoretical guarantees are also provided when disturbances are additive. Numerical studies further demonstrate the effectiveness and efficiency of the methodology.
    Comment: 31 pages, 7 figures, and 3 tables
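    To make "updating the distribution of disturbances using Bayesian inference" concrete, the Python sketch below shows a toy run-to-run loop with a conjugate normal update of an unknown constant disturbance mean. It is not the MFRL-BI algorithm. The additive process $y = u + d$, the Gaussian prior, the known noise variance and all names (`DisturbanceBelief`, `run_to_run_control`) are hypothetical assumptions chosen to mirror the additive-disturbance case the abstract mentions.

```python
from dataclasses import dataclass


@dataclass
class DisturbanceBelief:
    """Gaussian belief about an unknown constant disturbance mean.

    Hypothetical conjugate model: each observed disturbance equals the
    unknown mean plus Gaussian noise with known variance `noise_var`.
    """
    mean: float       # posterior mean of the disturbance mean
    var: float        # posterior variance of the disturbance mean
    noise_var: float  # assumed known observation-noise variance

    def update(self, observed: float) -> None:
        # Normal-normal update: precisions add, means are precision-weighted.
        precision = 1.0 / self.var + 1.0 / self.noise_var
        self.mean = (self.mean / self.var + observed / self.noise_var) / precision
        self.var = 1.0 / precision


def run_to_run_control(target: float, disturbances: list, belief: DisturbanceBelief) -> list:
    """Toy run-to-run loop for a hypothetical additive process y = u + d."""
    outputs = []
    for d in disturbances:
        u = target - belief.mean  # offset the currently expected disturbance
        y = u + d                 # process output for this run
        outputs.append(y)
        belief.update(y - u)      # with an additive model, d is recovered as y - u
    return outputs
```

    With a constant disturbance, the posterior mean converges toward it and the outputs settle near the target. A nonlinear process such as CMP would need the model-free machinery the paper describes.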

    High-efficiency and positivity-preserving stabilized SAV methods for gradient flows

    Scalar auxiliary variable (SAV)-type methods are popular techniques for solving various nonlinear dissipative systems. Compared with the semi-implicit method, the baseline SAV method preserves a modified energy dissipation law but doubles the computational cost. The general SAV approach adds no extra computation but requires a semi-implicit solution to be computed in advance, which may compromise accuracy and stability. In this paper, we construct novel first- and second-order unconditionally energy-stable and positivity-preserving stabilized SAV (PS-SAV) schemes for $L^2$ and $H^{-1}$ gradient flows. The constructed schemes reduce the computational cost of the baseline SAV method by nearly half while preserving its accuracy and stability. Moreover, the introduced auxiliary variable is always positive, whereas the baseline SAV method cannot guarantee this positivity-preserving property. Unconditional energy dissipation laws are derived for the proposed schemes. We also establish a rigorous error analysis of the first-order scheme for the Allen-Cahn type equation in the $l^{\infty}(0,T;H^1(\Omega))$ norm. In addition, we propose an energy optimization technique that keeps the modified energy close to the original energy. Several numerical examples demonstrate the accuracy and effectiveness of the proposed methods.
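    For context on the baseline the abstract compares against, the Python sketch below implements the standard first-order SAV scheme (not the proposed PS-SAV schemes) for a 1D periodic Allen-Cahn equation. It uses a Fourier spectral discretization with $\mathcal{L} = -\varepsilon^2\partial_{xx}$, $F(\phi) = \tfrac14(\phi^2-1)^2$ and $r = \sqrt{E_1(\phi) + C}$. Each step solves $(I/\Delta t + \mathcal{L})x = \text{rhs}$ twice, which is the doubled cost mentioned above. The domain, $\varepsilon$, $\Delta t$ and the constant $C$ are assumptions. Nothing in this baseline update forces $r$ to stay positive.

```python
import numpy as np


def sav_allen_cahn_1d(phi0, eps=0.05, dt=1e-2, steps=200, length=2 * np.pi, C=1.0):
    """Baseline first-order SAV scheme for 1D periodic Allen-Cahn (sketch).

    Returns the final phi and the modified energy at every step.
    """
    n = phi0.size
    h = length / n
    k = 2 * np.pi * np.fft.fftfreq(n, d=h)  # angular wavenumbers
    L_hat = eps ** 2 * k ** 2                # Fourier symbol of L = -eps^2 d^2/dx^2

    def inner(f, g):
        return h * np.dot(f, g)              # discrete L^2 inner product

    def E1(phi):
        return h * np.sum(0.25 * (phi ** 2 - 1.0) ** 2)

    def apply_L(phi):
        return np.real(np.fft.ifft(L_hat * np.fft.fft(phi)))

    def solve_A(rhs):
        # (I/dt + L) x = rhs is diagonal in Fourier space.
        return np.real(np.fft.ifft(np.fft.fft(rhs) / (1.0 / dt + L_hat)))

    def modified_energy(phi, r):
        return 0.5 * inner(phi, apply_L(phi)) + r ** 2 - C

    phi = np.asarray(phi0, dtype=float).copy()
    r = np.sqrt(E1(phi) + C)
    energies = [modified_energy(phi, r)]
    for _ in range(steps):
        b = (phi ** 3 - phi) / np.sqrt(E1(phi) + C)  # F'(phi^n) / sqrt(E1 + C)
        p = solve_A(phi / dt)                       # first linear solve
        q = solve_A(b)                              # second linear solve
        # Scalar equation for r^{n+1}, from r^{n+1} - r^n = (b, phi^{n+1} - phi^n) / 2.
        r = (r + 0.5 * inner(b, p) - 0.5 * inner(b, phi)) / (1.0 + 0.5 * inner(b, q))
        phi = p - r * q
        energies.append(modified_energy(phi, r))
    return phi, np.array(energies)
```

    The returned modified energy should be non-increasing for any $\Delta t$, which is the unconditional dissipation property SAV-type schemes are built around.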

    Research on the functional semantic field of spatial orientation in russian and chinese languages

    Based on Bondarko's functional grammar theory and drawing on the Russian State Corpus and the BCC Corpus of Peking University, this paper discusses the linguistic means of expression in each subfield of the functional semantic field of the spatial orientation category in Russian and Chinese, and constructs the structure of the directional functional semantic field. It also systematically compares and analyzes the characteristics of the expressions of the directional functional semantic field in the two languages. The results will help Chinese students better grasp the grammatical structure of Russian spatial direction prepositions, provide theoretical guidance for Chinese college students learning Russian, and offer theoretical support for teachers of Russian.

    Evaluation of a Drought Tolerance Native Grass: Cleistogenes songorica for the Turf Use Purpose

    Water deficit is one of the most important factors limiting turfgrass growth, especially in northwest China, where water available for landscape irrigation is increasingly limited. Using drought-tolerant turfgrass species or cultivars is one strategy for reducing water use and irrigation requirements (Nielsen and Stewart 1990). A recent study showed that regionally adapted native grass species are worth investigating as alternatives to conventional turfgrasses in many applications (Mark et al. 2011), and several native grass species have also been reported to be suitable for low-maintenance turf use (Mintenko et al. 2002). Awnless cleistogenes (Cleistogenes songorica) is a drought-tolerant perennial grass native to the desert grasslands of northwest China. It grows well with a mean annual rainfall of 100 to 200 mm and tolerates winter temperatures as low as -40 °C. The authors have previously reported a series of studies on the domestication of this species, including seed germination ecology (Yu et al. 2004), seedling establishment techniques (Tai et al. 2008) and seed production (Wei 2010). This paper reports the performance of C. songorica as a turfgrass under drought conditions.